Habitat-based radiomics enhances the ability to predict lymphovascular space invasion in cervical cancer: a multi-center study

Introduction: Lymphovascular space invasion (LVSI) is associated with lymph node metastasis and poor prognosis in cervical cancer. In this study, we investigated the potential of radiomics, derived from magnetic resonance (MR) images using habitat analysis, as a non-invasive surrogate biomarker for predicting LVSI in cervical cancer. Methods: This retrospective study included 300 patients with cervical cancer who underwent surgical treatment at two centres (centre 1 = 198 and centre 2 = 102). Using the k-means clustering method, contrast-enhanced T1-weighted imaging (CE-T1WI) images were segmented based on voxel and entropy values, creating sub-regions within the volume of interest. Radiomics features were extracted from these sub-regions. Pearson correlation coefficient and least absolute shrinkage and selection operator (LASSO) regression methods were used to select features associated with LVSI in cervical cancer. Support vector machine (SVM) models were developed in the training cohort based on the radiomics features extracted from each sub-region. Results: The voxels and entropy values of the CE-T1WI images were clustered into three sub-regions. In the training cohort, the areas under the curve (AUCs) of the SVM models based on radiomics features derived from the whole tumour, habitat 1, habitat 2, and habitat 3 were 0.805 (95% confidence interval [CI]: 0.745–0.864), 0.873 (95% CI: 0.824–0.922), 0.869 (95% CI: 0.821–0.917), and 0.870 (95% CI: 0.821–0.920), respectively. Compared with the whole tumour model, the habitat 3 model showed the highest predictive performance in the external test cohort (AUC: 0.780 [95% CI: 0.692–0.869]). Conclusions: The radiomics model based on the tumour sub-regional habitat demonstrated better predictive performance for LVSI in cervical cancer than the radiomics model derived from the whole tumour.


Introduction
Cervical cancer is one of the most prevalent gynaecological malignancies worldwide, ranking fourth in cancer incidence among women (1). In 2020, approximately 110,000 new cases of cervical cancer were diagnosed in China alone, representing 18% of the new cases of cervical cancer worldwide (2). In some developing countries, the prevalence and mortality rates of cervical cancer surpass those of breast cancer (3,4). In cervical cancer, lymphovascular space invasion (LVSI), the infiltration of tumour cells into the blood and lymphatic vessels, is closely associated with lymph node metastasis and serves as an independent prognostic risk factor (5-7). According to the 2018 International Federation of Gynecology and Obstetrics (FIGO) staging and treatment guidelines, the treatment decision for patients with stage IA1 cervical cancer should take into account the LVSI status. Patients with LVSI-positive lesions should undergo adjuvant chemoradiotherapy or additional radical resection and lymph node dissection surgery to suppress the spread of lymph node micrometastases and improve prognosis (8). Therefore, determining LVSI status is important for making treatment decisions, especially in women of childbearing age who wish to preserve fertility.
Malignancies are highly heterogeneous, and tumours exhibit diverse microenvironments and microstructures (9-11). Radiomics, which involves extracting numerous features from medical images to classify diseases using machine-learning techniques, offers the potential to deliver personalised medicine in a non-invasive manner. Traditional radiomic analysis typically focuses on the whole tumour and overlooks the sub-regional phenotypic variations within the tumour (12). Recently, a new approach called habitat analysis, which divides tumours into sub-regions by grouping grayscale voxels with comparable imaging characteristics (12,13), has shown potential for better characterising tumour heterogeneity (14-16). In this study, we aimed to extract radiomic signatures from different sub-regions of cervical cancer using contrast-enhanced T1-weighted imaging (CE-T1WI) with habitat analysis to decode the LVSI status, thereby facilitating personalised therapeutic decision making.

Materials and methods
This study was approved by two medical ethics committees that conducted ethical reviews and waived the requirement for obtaining patient consent.

Patient population
We included 300 patients with pathologically confirmed cervical cancer who underwent pelvic magnetic resonance (MR) imaging within 1 month before surgery and had not received any antitumour therapy before MR imaging. Among them, 198 patients from centre 1 constituted the training cohort, whereas the remaining 102 from centre 2 constituted the external test cohort. We collected and organised two distinct datasets of MRI images from female patients diagnosed with cervical cancer using a picture archiving and communication system. The training cohort comprised 104 LVSI-positive and 94 LVSI-negative patients, and the external test cohort comprised 54 LVSI-positive and 48 LVSI-negative patients. We retrospectively analysed clinical data and laboratory indicators, including age, maximum tumour diameter, histological classification, degree of cellular differentiation, FIGO stage, CA125 and CA199 levels, squamous cell carcinoma antigen, and human papillomavirus infection status. The inclusion criteria for the study population were as follows: 1) patients who underwent pelvic MRI before surgery and 2) LVSI confirmed by postoperative pathological examination. The exclusion criteria were as follows: 1) pregnant women; 2) those who underwent cervical conization or loop electrosurgical excision; 3) those who had a history of radiotherapy or chemotherapy before the MRI examination; and 4) those with blurry diagnostic images.

MRI protocols
The scanning protocol and parameters are included in the Supplementary Material. The CE-T1WI images were downloaded from the picture archiving and communication system and transferred to a personal computer. Two radiologists, each with more than 5 years of experience in pelvic diagnosis, segmented the tumours layer-by-layer on the CE-T1WI images using the open-source software ITK-SNAP (version 3.6, www.itk-snap.org) to obtain the volume of interest (VOI), with the aid of diffusion-weighted imaging (DWI). After 1 week, 30 sets of CE-T1WI images were randomly selected, and the outlining process was repeated. Features with an intraclass correlation coefficient (ICC) of <0.75 were excluded during feature screening. Any differences in the outlining process were resolved by a radiologist with over 15 years of experience. The two radiologists were blinded to the patients' pathological diagnoses during the outlining process. A flowchart illustrating this process is presented in Figure 1.

VOI delineation and sub-region clustering
Habitat analysis uses voxel and entropy values derived from MR images to cluster VOIs into sub-regions (17-19). The voxel counts for each tumour VOI were determined using a traditional method, whereas the entropy values were computed for each layer of the MR images using the following formula: [formula missing from the source text]. The k-means method was employed to cluster the VOI regions at the patient level, forming multiple habitats, and the distance between samples was calculated as the Euclidean distance over the voxel and entropy values. The number of habitats was tested from 2 to 10, and the optimal k-value was selected using the Consensus Cluster Plus method, which evaluated the consistency of clustering by repeatedly resampling voxels (1000 times, each on 80% of the samples) and selecting the k-value corresponding to well-separated and stable clusters. The optimal k-value served as the criterion for selecting the optimal number of clusters at the patient population level (Figure 2). The optimal k-value was found to be 3. Using the OnekeyAI platform, we imported each patient's VOI into the platform's components and classified the cervical cancer tumours into three classes named habitat 1, habitat 2, and habitat 3.
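The clustering step described above can be illustrated with a minimal sketch. It is not the OnekeyAI implementation used in the study. It assumes Shannon entropy over binned intensities (the study's own entropy formula is not available in this text), a plain Lloyd's k-means with Euclidean distance, and a deterministic farthest-point initialisation chosen only to keep the toy example reproducible. All function names and the toy (intensity, entropy) pairs are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=16, lo=0, hi=255):
    """Shannon entropy (bits) of intensities after histogram binning (assumed form)."""
    width = (hi - lo + 1) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def init_farthest(points, k):
    """Deterministic start: repeatedly pick the point farthest from chosen centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with Euclidean distance; returns (labels, centres)."""
    centres = init_farthest(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Toy voxels described by (intensity, entropy); the values are made up.
voxels = [
    (40, 1.0), (42, 1.1), (38, 0.9),
    (120, 2.5), (118, 2.4), (122, 2.6),
    (210, 4.0), (208, 3.9), (212, 4.1),
]
habitat_labels, habitat_centres = kmeans(voxels, k=3)
```

In this toy example the two features are left on their raw scales, so intensity dominates the distance; whether and how the study rescaled them before clustering is not stated.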

Feature selection and model development
To account for differences in imaging features caused by variations in the reconstruction layer thickness and pixel size, the images were resampled to a voxel size of 1 × 1 × 3 mm^3 and normalized to a grayscale range of 0-255. Features were independently extracted from each of the four regions (habitat 1, habitat 2, habitat 3, and the whole tumour) using the PyRadiomics program package (20), which adheres to the imaging biomarker standardization initiative (21). Before feature extraction, two filters, wavelet and Laplacian of Gaussian (log-sigma), were applied to the images, facilitating the extraction of various types of features, including first-order, shape, gray-level co-occurrence matrix, gray-level size zone matrix, gray-level run length matrix, neighbouring gray-tone difference matrix, and gray-level dependence matrix.
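As a rough illustration of the preprocessing and extraction settings described above, the sketch below lays out a parameter dictionary in the three-part structure of a PyRadiomics parameter file (setting, imageType, featureClass), together with a simple min-max rescaling to 0-255. The interpolator, the LoG sigma values, and the min-max form of the grayscale normalisation are assumptions, as the text does not specify them. The helper name `rescale_0_255` is hypothetical.

```python
# Hypothetical settings laid out like a PyRadiomics parameter file.
params = {
    "setting": {
        "resampledPixelSpacing": [1, 1, 3],  # mm; taken from the text
        "interpolator": "sitkBSpline",       # assumption: not stated in the source
    },
    "imageType": {
        "Original": {},
        "LoG": {"sigma": [1.0, 2.0, 3.0]},   # assumption: sigma values are not given
        "Wavelet": {},
    },
    # An empty list is understood to request every feature in the class.
    "featureClass": {
        name: []
        for name in ("firstorder", "shape", "glcm", "glszm", "glrlm", "ngtdm", "gldm")
    },
}


def rescale_0_255(values):
    """Min-max rescale intensities to the 0-255 range (assumed normalisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant image has no contrast to stretch
    return [255.0 * (v - lo) / (hi - lo) for v in values]


rescaled = rescale_0_255([10, 20, 30])
```

A dictionary of this shape is the kind of input a PyRadiomics feature extractor is configured with; the extraction call itself is omitted here because it needs image and mask files.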
First, features with ICC < 0.75 were screened out, and the remaining radiomic features, which differ in scale, were subjected to Z-score processing, normalizing the data to a mean of 0 and a variance of 1.
After normalising all the data, the correlation between features was calculated using the Pearson correlation coefficient. When the correlation between any two features exceeded 0.9, only one of them was retained. Finally, the remaining features in the training dataset were filtered using the least absolute shrinkage and selection operator (LASSO) regression model.
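The Z-score and Pearson filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the text does not say which feature of a highly correlated pair was kept, so the sketch keeps the one encountered first. The LASSO step is not shown. The function names and the toy feature table are hypothetical.

```python
import math


def zscore(column):
    """Standardise one feature column to mean 0 and (population) variance 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| <= threshold with all kept ones."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table: f2 is almost a multiple of f1, f3 is weakly related.
toy_features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10.5],
    "f3": [5, 1, 4, 2, 3],
}
standardised = {name: zscore(col) for name, col in toy_features.items()}
selected = drop_correlated(standardised)
```

Because the Pearson coefficient is unchanged by standardisation, the filter gives the same result on raw or Z-scored columns; the order of the two steps here simply follows the text.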
A support vector machine (SVM) classification model was developed in the training cohort based on features extracted from habitat 1, habitat 2, habitat 3 and the whole tumour with five-fold cross-validation and finally validated in an external test cohort.
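The five-fold cross-validation partition can be sketched as below, using the training cohort's class counts (104 LVSI-positive, 94 LVSI-negative). The text does not state whether the folds were stratified or how they were shuffled, so stratification and the seed are assumptions. Fitting the SVM on each training split is omitted, and the function name is hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds while keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# 1 = LVSI-positive, 0 = LVSI-negative (training cohort counts from the text).
cohort_labels = [1] * 104 + [0] * 94
folds = stratified_folds(cohort_labels, k=5)

# Each fold is held out once; a classifier would be fitted on train_idx
# and evaluated on test_idx (model fitting is not shown here).
splits = []
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    splits.append((train_idx, test_idx))
```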

Statistical analysis
Clinical characteristics were compared using the chi-square test or Fisher's exact test for categorical variables and the t-test or Mann-Whitney U test for continuous variables.
The predictive performance of the models for LVSI in cervical cancer was evaluated using the area under the curve (AUC) of the receiver operating characteristic curve. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were calculated. The model with the highest AUC was validated using an external test cohort. The generalisation of the model was assessed using the DeLong test to compare the predictive performance in the training and test cohorts, as well as the calibration curves. Finally, the net benefit of the model, as a measure of its clinical usefulness, was assessed using decision curve analysis. Statistical significance was set at P < 0.05.

FIGURE 1
Flowchart showing the habitat analysis process.
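The evaluation metrics named in this section follow standard definitions, sketched below. The AUC is computed as the probability that a positive case scores higher than a negative case, and net benefit uses the usual decision-curve form TP/n - (FP/n) × pt/(1 - pt). All scores and counts are toy values, not study results; the function names are hypothetical and the DeLong test is not shown.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at threshold probability `threshold`."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy inputs for illustration only.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
toy_metrics = confusion_metrics(tp=8, fp=2, tn=6, fn=4)
toy_net_benefit = net_benefit(tp=8, fp=2, n=20, threshold=0.5)
```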

Results

Clinical characteristics
A total of 300 patients from two centres were included in the study, with mean ages of 51.48 ± 10.63 and 50.35 ± 9.77 years in the training and external test cohorts, respectively (Table 1). Among them, 161 cases were classified as FIGO stage I, 105 cases as stage II, and 34 cases as stage III. Squamous cell carcinoma was present in 226 cases, adenocarcinoma in 54 cases, and adenosquamous carcinoma in 20 cases. In the training cohort, maximum diameter, degree of cellular differentiation, CA125 levels, and FIGO stage differed significantly between the LVSI-positive and LVSI-negative groups. In the external test cohort, maximum diameter, CA125 levels, and FIGO stage differed significantly. No other clinical characteristic differed significantly between the LVSI-positive and LVSI-negative groups in either cohort.

Feature selection
A total of 1016 radiomic features were extracted from each of habitat 1, habitat 2, habitat 3, and the whole tumour. After excluding features with ICC values < 0.75, the numbers of remaining radiomic features for habitat 1, habitat 2, habitat 3, and the whole tumour were 713, 617, 692, and 627, respectively. Filtering by Pearson correlation coefficient left 190, 148, 170, and 155 features for habitat 1, habitat 2, habitat 3, and the whole tumour, respectively. The remaining radiomic features of the training cohort were then screened using the least absolute shrinkage and selection operator regression method for model building, yielding 19, 18, 19, and 7 best radiomic features for habitat 1, habitat 2, habitat 3, and the whole tumour, respectively. These results are presented in the Supplementary Materials.

Performance evaluation of radiomics based on habitat imaging
We developed SVM machine learning models based on the most distinctive radiomic features of habitat 1, habitat 2, habitat 3, and the whole tumour. The prediction efficiency of each model is summarized in Table 2. Figure 3 illustrates the receiver operating characteristic curves of the SVM machine learning models. In the training cohort, the areas under the curve (AUCs) were 0.805 (95% confidence interval [CI]: 0.745-0.864), 0.873 (95% CI: 0.824-0.922), 0.869 (95% CI: 0.821-0.917), and 0.870 (95% CI: 0.821-0.920) for the whole tumour, habitat 1, habitat 2, and habitat 3, respectively. In the external test cohort, the corresponding AUCs were 0.629 (95% CI: 0.519-0.739), 0.683 (95% CI: 0.577-0.789), 0.649 (95% CI: 0.540-0.757), and 0.780 (95% CI: 0.692-0.869). The habitat 3 model outperformed the whole tumour model in the external test cohort. Figure 4 displays the calibration curves for the training and external test cohorts, showing good calibration in both. Figure 5 presents the decision curve analysis for the external test cohort, with a clear net benefit observed for the habitat 3-based SVM model. Thus, the potential clinical value of our model in early cervical cancer was highlighted. Figure 6 presents the feature weight map and confusion matrix of the habitat 3 radiomics model. The DeLong test revealed statistically significant differences between the habitat 3 and whole tumour models in both the training and external test cohorts.

FIGURE 2
Selection of the number of clusters based on the area change under the conditional density function curve. Clustering separation was optimal at a k value of 3 (A, B). This value corresponded to a sharp decrease in the area change under the curve, which suggested that beyond this k value, further improvements in separability were negligible.

Discussion
In this study, three sub-regions were delineated based on voxel and entropy values from contrast-enhanced T1-weighted imaging (CE-T1WI) of cervical cancer using habitat analysis, which characterises tumour heterogeneity. The SVM models based on the three habitat sub-regions exhibited a higher predictive performance for LVSI in cervical cancer than the model derived from the whole tumour. Notably, the habitat 3 model achieved an AUC of 0.870 (95% CI: 0.821-0.920) in the training cohort, and this performance was robust across centres: the AUC of the model in the external test cohort was 0.780 (95% CI: 0.692-0.869), and the difference between the training and external test cohorts was not statistically significant. The prediction model built on habitat 3 also outperformed the conventional whole tumour model in both the training and external test cohorts. This indicated that the tumour sub-regional radiomics model based on habitat analysis could enhance LVSI prediction in cervical cancer.
Cervical cancer primarily metastasizes through blood or lymphatic vessels to other body tissues (22). Previous studies have indicated that LVSI positivity implies a higher risk of lymph node metastasis and a greater probability of lymph node micrometastasis (23). LVSI is widely recognised as a risk factor in cervical cancer and directly affects patient prognosis (24). The treatment of cervical cancer varies according to the stage and the presence of LVSI in patients with clinical stage IA (8). In the absence of LVSI, cervical conization alone may be sufficient, avoiding radical hysterectomy. Therefore, the preoperative evaluation of LVSI is essential (25-27).
In the final analysis, we included 300 patients. In the training and validation cohorts, the association between FIGO stage and LVSI status was statistically significant. The proportion of patients with LVSI increased from 42.86% (69/161) in stage IB to 58.10% (61/105) and 82.35% (28/34) in stages II and III, respectively, suggesting that the probability of LVSI increased progressively with advancing stage of cervical cancer. In our study, squamous cell carcinoma was present in 226, adenocarcinoma in 54, and adenosquamous carcinoma in 20 patients. The association between histological type and LVSI in the training and validation cohorts was not statistically significant, indicating that the histological type of cervical cancer was not associated with the occurrence of LVSI (28).
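The stage-wise proportions quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the percentages from the reported counts and confirms that they add up to the cohort totals given in the Patient population section (104 + 54 LVSI-positive patients out of 300). The variable names are hypothetical.

```python
# (LVSI-positive, total) per stage, as reported in the text.
lvsi_by_stage = {"I/IB": (69, 161), "II": (61, 105), "III": (28, 34)}

# Percentage of LVSI-positive patients in each stage, to two decimals.
rates = {
    stage: round(100 * positive / total, 2)
    for stage, (positive, total) in lvsi_by_stage.items()
}

total_positive = sum(positive for positive, _ in lvsi_by_stage.values())
total_patients = sum(total for _, total in lvsi_by_stage.values())
```

The recomputed values match the reported 42.86%, 58.10%, and 82.35%, and the counts sum to 158 LVSI-positive patients out of 300.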
Compared to whole-tumour radiomics, habitat imaging, an approach focused on sub-regional radiomics analysis, offers better quantification of tumour sub-regions that are more relevant to tumour growth or invasiveness (15). Invasive sub-regions have been reported to be important for prognosis and treatment response (29,30). [...] cancer and extracted radiomic features to establish a breast cancer habitat risk score that could accurately categorise patients into high-risk and low-risk groups (32). Choi et al. used multi-parametric MR to extract radiomic features from multiple habitats of the tumour and identified three different subtypes through consensus clustering, revealing different phenotypic subtypes of glioblastoma with clinical and genomic significance. This approach highlights the potential of radiomics as a prognostic biomarker by using multi-habitat imaging (33).

FIGURE 3
The ROC curves of the SVM machine learning models in the training (A) and external test cohorts (B).
In this study, we employed CE-T1WI images to conduct a clustering analysis, enabling the effective evaluation of blood perfusion by displaying vascular density and perfusion. Additionally, we measured the volume transfer constant, which depends on the permeability of tumour blood vessels (34). This approach provided more discriminatory information for predicting LVSI in cervical cancer. Our prediction results indicated that the radiomics model based on habitat 3 outperformed the whole tumour model in both the training and external test cohorts. The heterogeneous nature of solid tumours suggested that LVSI in cervical cancer might not be distributed uniformly and could vary at the voxel level. After clustering the image voxels and entropy values, habitat 3 was observed to contain more LVSI information, whereas the whole tumour comprised complete heterogeneous information. Our use of habitat analysis, a novel technique for clustering solid tumours in preoperative imaging and subsequently extracting radiomic features from the clustered tumour sub-regions, helped to exclude areas unrelated to LVSI in cervical cancer from the feature extraction process, thereby improving the model's predictive performance.
This study had some limitations. First, although this study included a larger number of patients than previous studies, a larger prospective dataset will be required to further improve the model's performance. Second, the diversity of MR devices across centres could have introduced variability in the MR images due to differences in equipment and scanning parameters. We therefore standardised and normalised the images as far as possible to reduce the effect of equipment-related differences.

FIGURE 4
The calibration curves in the training (A) and external test cohorts (B).

FIGURE 5
The decision curve analyses of the radiomic model in the external test cohort. The habitat 3-based SVM model achieved a high net benefit.
In conclusion, the sub-region-based approach predicted the LVSI status in cervical cancer with better performance than traditional whole-tumour radiomics, making it a promising non-invasive biomarker for preoperative LVSI prediction in patients with cervical cancer. The model's performance remained stable in the external test cohort, with an AUC of 0.780.

Table 1 presents the clinical characteristics of patients with cervical cancer.

TABLE 1
Characteristics of cervical cancer patients in training and external test cohorts.

TABLE 2
LVSI prediction performance of SVM model.